To all whom it may concern:

Be it known that we, Youssef (NM) Drissi, Moon Ju Kim, Lev (NMI) Kozakov and Juan
(NMI) Leon Rodriguez, citizens of Morocco, United States of America, Israel and Mexico,
respectively, residing in the states of New York, New York, Connecticut and New York,
respectively, have invented new and useful improvements in

METHOD AND SYSTEM FOR SEARCHING A MULTI-LINGUAL 

DATABASE 


of which the following is a SPECIFICATION: 
METHOD AND SYSTEM FOR SEARCHING A MULTI-LINGUAL

DATABASE 

Background of the Invention 


Field of the Invention 

The present invention relates to the field of searching a database using search term(s)
entered by a user. More particularly, the present invention is a system and method for searching
a database including material in different languages, where the search term(s) are entered in
one of the languages and where the database need not be translated into the different languages.

Background Art

Various methods have been proposed for searching a database wherein the database
includes material in multiple languages. One approach is to translate the entire database into the
language in which a search term is entered or the language of the user. However, this could
involve a large amount of translation for a sizable database (and multiple translations if the
database is used by users in different languages). Further, each process of translating a document
has the potential for losing (or distorting) some of the meaning of the original text.

For these reasons, it is desirable to avoid translating the documents to allow for a search
in a particular language.

Another approach is to use a synonym list and apply it to the search term(s) entered in one
language. That is, the text of the documents in the database remains in the original language and
synonyms in each language for each search term are used for the search of the database. This
system may work in some cases but is undesirable in other cases because considering all of the
synonyms in the different languages could lead to incorrect results. The word for "network" in
Spanish is "red" and a search on "network" which blindly translates the search term would
incorrectly find English documents which include the color "red".

Further, some of the documents include text in one language and key words presented in
a different language to avoid changing the meaning. Thus, it is desirable to search a database
which includes these terms, but it would not be effective to search only for the translated form of
the word.

As will be apparent to one skilled in the relevant art, the process of translating and
searching in multiple languages can consume substantial computing resources. Many of the
multi-language database searching techniques require a powerful computer or take an inordinate
amount of time to process a single search, the amount depending on the size of the database, the
number of supported languages and the nature of the queries. However, the computing resources
have a cost associated with them, either in requiring a larger or faster system or in terms of tying
up the computer while a large task is running to the exclusion of other users. Further, a search
which takes a long period of time may prevent the user from interactively modifying the search
to obtain meaningful results. Accordingly, it is desirable to avoid using large computing
resources.

Accordingly, existing systems and methods for searching databases have undesirable
disadvantages and limitations which will be apparent to those skilled in the art in view of the
following description of the present invention.
Summary of the Invention

The present invention overcomes the disadvantages and limitations of the prior art
systems by providing a simple, yet effective, method and system for searching a database
including documents in multiple supported languages. The present invention also supports
searching a database in which the text is comprised of documents written in multiple languages,
including those documents which are written in one language but which include words or
phrases from a second language.

The present invention has the advantage that a translation of the documents in the
database into each of the supported languages is not required.

The present invention also has the advantage that the meaning of the original document is
not lost or distorted through a translation process to allow searching of the document in different
languages.

The present invention also allows for the searching of a database in a native or natural
language while finding documents which are written in other languages.

Other objects and advantages of the system and method of the present invention will be
apparent to those skilled in the relevant art, in view of the following description of the preferred
embodiment, taken together with the accompanying drawings and the appended claims.

Brief Description of the Drawings

Having thus described some of the objects and advantages of the present invention, other 
objects and advantages will be apparent to those skilled in the art in view of the following 
description of the invention taken in conjunction with the accompanying drawings in which: 
Fig. 1 is a diagrammatic view of a traditional search technique in which documents exist
in two different languages;

Fig. 2 is a diagrammatic view of an improved multi-lingual document
database index system of the present invention;

Fig. 3 is a dual language (or multi-language) database search system of the present
invention;

Fig. 4 is a flow chart illustrating sample logic performed in practicing the present
invention; and

Fig. 5 is a synonym table of the type which is useful in carrying out the present invention
as described in connection with Figs. 2-4, associating a word in one language with its counterpart
in another language.



Detailed Description of the Preferred Embodiment

In the following description of the preferred embodiment, the best implementation of
practicing the invention presently known to the inventor will be described with some
particularity. However, this description is intended as a broad, general teaching of the concepts
of the present invention by way of a specific embodiment and is not intended to limit the
present invention to what is shown in this embodiment, especially since those skilled in the
relevant art will recognize many variations and changes to the specific structure and operation
shown and described with respect to these figures.

Fig. 1 illustrates a traditional search system, that is, one of the prior art, in which
documents in English (a first language) are represented by the symbol 102 and documents in a
second language such as a national language (NL) are represented by the symbol 122. While
each set of documents is maintained separately, each is indexed through a process of extracting
the keywords and creating an index, represented by the box 104 for the English documents 102
and the box 124 for the second language documents 122. The next step is that an inverted index
is created for each set of documents, the English inverted index at block 106 and the second
language inverted index at block 126. Then, a search or query is formatted and applied
against a selected one of the databases, represented by an English query at block 108 and a national
language query at block 128. The results of the English query are shown by block 110 and the
results of a national language query are represented by the box 130. Thus, the steps of the
process are carried out separately for each database and include indexing the documents at
block 112, creating an inverted index at block 114 and conducting a search and providing an
output at block 116. While the steps are the same regardless of which type of database is used,
each database is kept separate, each is searched separately and each generates separate
results. Since this same structure could be applied to any number of separate databases, this
system could expand to support the number of languages desired.

However, some technical documents are written in a native language (such as Spanish)
but use technical terms from another language (for example, from English). In such a system,
searching the national language database for the national language equivalent of a search term
will not find the search term if it is included in the document in another language.
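
The limitation just described can be illustrated with a minimal Python sketch. It assumes a deliberately naive whitespace tokenizer and two hypothetical one-document collections; the function name build_inverted_index and the sample texts are illustrative only and do not come from Fig. 1.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


# Hypothetical sample data: one English document and one Spanish document
# that keeps the English technical term "network" untranslated.
english_docs = {"en1": "configuring the network adapter"}
spanish_docs = {"es1": "configurar el adaptador de network"}

# Each collection is indexed separately, as in the prior-art arrangement.
english_index = build_inverted_index(english_docs)
spanish_index = build_inverted_index(spanish_docs)

# A Spanish-speaking user searches the Spanish collection for "red" (network).
spanish_hits = spanish_index.get("red", set())
print(spanish_hits)  # the relevant document es1 is missed
```

The Spanish document is relevant, yet it is reachable only through the English term it actually contains, which the separate national language search never tries.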

Fig. 2 illustrates a system for merging documents in different languages into a single
index. As shown in this Figure, documents in a first language (English) are represented by the
symbol 202 and documents in a second language (a national language) are represented by the
symbol 204. Keywords are identified from each document in a step 206, then translated into
each supported language at block 208. Separate indices 210, 212 in each language are created
from the translated keywords. Then, an inverted index 214 is created from the translated
keywords. The translation of keywords is preferably accomplished using a keyword dictionary
220 which includes words in English associated with the corresponding keywords in the national
language (and vice versa) to form a synonym listing which effectively translates a keyword in
one language into the corresponding term in another language (and vice versa). This listing of
synonyms accomplishes the translation of keywords in the creation of the indices and for later
searching as will be described in connection with Fig. 3. In order to manage various
languages, it is proposed to represent each term using the Unicode system (UTF-8), although any
other system which is accurate and consistent could also be used to advantage in the present
invention.

Thus, the process of creating an inverted index involves the steps of creating, in block 232, an
index in each language and creating, in block 234, a merged inverted index using the keyword
dictionary 220 which includes synonyms in each supported language. While two languages are
shown in the figures of the present invention, the present invention can easily be expanded to
support the desired number of languages, and, while English is described as one language for the
documents and for the searches, the present invention is not limited to serving documents in
English and another language could be substituted, if desired.
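
One way to picture the index creation of Fig. 2 is the following Python sketch. It is not the patented implementation: the dictionary contents, the whitespace tokenizer, the (language, term) key layout and the names KEYWORD_DICTIONARY, translate_keyword and build_merged_index are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical keyword dictionary: each row lists one concept per language.
KEYWORD_DICTIONARY = [
    {"en": "network", "es": "red"},
    {"en": "processor", "es": "procesador"},
]


def translate_keyword(word, source_lang):
    """Return the {language: term} row for a keyword, or None if not a keyword."""
    for row in KEYWORD_DICTIONARY:
        if row.get(source_lang) == word:
            return row
    return None


def build_merged_index(docs):
    """docs: {doc_id: (language, text)} -> {(language, term): set of doc ids}."""
    merged = defaultdict(set)
    for doc_id, (lang, text) in docs.items():
        for word in text.lower().split():
            # Identify the keyword and find its translations.
            row = translate_keyword(word, lang)
            if row is None:
                continue  # ordinary word, not a keyword
            # Post the document under the keyword in every supported language.
            for target_lang, term in row.items():
                merged[(target_lang, term)].add(doc_id)
    return merged


docs = {
    "en1": ("en", "the network is down"),
    "en2": ("en", "a red car"),
    "es1": ("es", "la red no responde"),
}
index = build_merged_index(docs)
print(sorted(index[("es", "red")]))  # ['en1', 'es1']
```

Because each document word is looked up only in the dictionary column for that document's own language, the English document about a red car is not posted under the Spanish keyword "red".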

Fig. 3 illustrates a search system of the type which is useful in the present invention. A
query is input at block 310, then passed to a keyword dictionary represented by block 320. The
keyword dictionary 320 includes a bi-directional translation system which translates keywords
from the English (or first) language 322 to a national (or second) language 324 and vice versa,
using, in its preferred embodiment, a stored synonym list in the form of a bi-directional table
such as is illustrated and described later, particularly in connection with Fig. 5. The synonym
table is designed to support a plurality of languages and allow translation between the supported
languages. The result is a pair of queries, one query 330 in the first language (e.g., English) and a
second query 340 in a second language (such as the national language). The English language
query 330 is applied against both the English inverted index 334 and the national language index
344, and the national language query 340 is applied against the national language index 344,
generating results: an English-language hitlist 338 and a national language hitlist 348. The user
then can select (represented by the box 350) which results are of interest to the user, at least to
start the process, since it is possible that the user will select one, determine that it is
inappropriate and try another selection. If the user has limited capabilities in understanding
English, he may prefer to look at the results 348 in the national language. If the national
language results 348 are not sufficient (or nonexistent), then he may go on to the English
language results 338. In the alternative, the user may recognize that the results of interest are
most likely to be the English results 338 and may start with those results. In another alternative,
the user finds so many results in English that he decides to review the more selective list in his
national language.
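
The query handling of Fig. 3 might be sketched in Python as follows. The synonym pairs, the sample indices, the AND semantics of search, and the choice to report national language documents matched by the English query in the national language hitlist are all assumptions of this sketch rather than details stated in the description.

```python
# Hypothetical bi-directional synonym list standing in for keyword dictionary 320.
EN_TO_NL = {"network": "red", "processor": "procesador"}
NL_TO_EN = {nl: en for en, nl in EN_TO_NL.items()}


def make_queries(keywords, query_lang):
    """Return (english_query, national_query) as lists of keywords."""
    if query_lang == "en":
        english = list(keywords)
        national = [EN_TO_NL[k] for k in keywords if k in EN_TO_NL]
    else:
        national = list(keywords)
        english = [NL_TO_EN[k] for k in keywords if k in NL_TO_EN]
    return english, national


def search(index, terms):
    """Return sorted doc ids containing every term (simple AND semantics)."""
    if not terms:
        return []
    postings = [index.get(term, set()) for term in terms]
    return sorted(set.intersection(*postings))


# Hypothetical inverted indices: 334 (English) and 344 (national language).
english_index = {"network": {"en1", "en2"}, "processor": {"en2"}}
national_index = {"red": {"es1"}, "network": {"es2"}, "procesador": {"es1"}}

english_query, national_query = make_queries(["red"], "es")

# English hitlist 338: English query against the English index.
english_hitlist = search(english_index, english_query)

# National language hitlist 348: both queries against the national index, so a
# Spanish document that keeps the English term "network" is still found.
national_hitlist = sorted(
    set(search(national_index, national_query))
    | set(search(national_index, english_query))
)
print(english_hitlist, national_hitlist)
```

The user is then free to start with whichever of the two lists seems more useful, as the description explains.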

Fig. 4 illustrates a flow chart of one process of practicing the present invention. As shown
in Fig. 4, the process begins with a determination of the language of the user and whether the
user wishes to limit his universe to documents written in his native language. The first step is to
determine the language of the user at block 410. Perhaps the user has entered his native or
national language or perhaps it is determined from his entries, such as a query in a given
language. Then, at block 420 the user enters the query in terms of keywords. Those keywords
are translated at block 430 and the queries produced are submitted to the searching mechanism
at block 440. Results are obtained at block 450 and a set of results may be selected at block
460.
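
The flow of Fig. 4 can be outlined in Python as below. This is only a sketch under stated assumptions: the language is inferred by counting which dictionary column the query terms fall in, English is an arbitrary default, and SYNONYM_ROWS, determine_language, translate, run_search and fake_search are hypothetical names. The search mechanism itself is passed in as a function so that the sketch stays independent of any particular index.

```python
# Hypothetical synonym rows; each row maps a language code to that language's term.
SYNONYM_ROWS = [
    {"en": "network", "es": "red"},
    {"en": "processor", "es": "procesador"},
]


def determine_language(keywords, declared_lang=None):
    """Block 410: use the declared language, else infer it from the query terms."""
    if declared_lang:
        return declared_lang
    votes = {}
    for row in SYNONYM_ROWS:
        for lang, term in row.items():
            if term in keywords:
                votes[lang] = votes.get(lang, 0) + 1
    # Falling back to English is an assumption of this sketch.
    return max(votes, key=votes.get) if votes else "en"


def translate(keywords, source_lang):
    """Block 430: produce one keyword query per supported language."""
    queries = {}
    for row in SYNONYM_ROWS:
        if row.get(source_lang) in keywords:
            for lang, term in row.items():
                queries.setdefault(lang, []).append(term)
    return queries


def run_search(raw_query, search_fn, declared_lang=None, native_only=False):
    keywords = raw_query.lower().split()                 # block 420
    lang = determine_language(keywords, declared_lang)   # block 410
    queries = translate(keywords, lang)                  # block 430
    if native_only:
        # The user limits the search to documents in the native language.
        queries = {lang: queries.get(lang, keywords)}
    # Blocks 440 and 450: submit each query and collect its results.
    results = {code: search_fn(code, terms) for code, terms in queries.items()}
    return lang, results  # the caller selects a result set (block 460)


def fake_search(lang, terms):
    """Stand-in for the real search mechanism; it echoes what it was asked."""
    return [f"{lang}:{term}" for term in terms]


lang, results = run_search("red", fake_search)
print(lang, results)
```

Passing native_only=True restricts the result sets to the user's own language, which corresponds to the user limiting his universe to documents written in his native language.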

In Fig. 5, a portion of a synonym table is shown by the reference numeral 500. The table
includes a plurality of columns, each associated with a different language. In Fig. 5 as shown,
these supported languages are English in column 510, Spanish in column 520, French in column
530 and Italian in column 540. An additional column 550 is shown provided for another
language such as German or Japanese, recognizing, of course, that some languages have
different types of characters from English and some languages have so many different symbols
that it may be necessary to use a double byte character set to represent some of such languages,
like Japanese. Two sets of synonyms are shown in rows in this Fig. 5, one associated with the
English word "network" in row 560 and one associated with the English word "processor" in row
570. In practice, the synonym table 500 may have additional columns as desired, as shown by the
symbol 590 (or may have fewer columns if fewer languages are supported; the selection of
supported languages is a matter of design choice and not a feature of the present invention), and
will have a row for each keyword, shown by the symbol 580. It is important to note that each
entry is associated with a language so that it is possible to associate a word with its language and
distinguish the Spanish word for network (red) from the English word for the color red,
if desired. While the table is shown in tabular form for ease in understanding the concept of a
synonym table, the table may exist in other known formats in storage according to conventional
data processing techniques.
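
A synonym table of this kind could be held in memory along the following lines. This Python sketch is only one possible storage format: the language codes, the tuple layout and the function name lookup are assumptions, and the French and Italian terms are ordinary dictionary translations supplied for illustration rather than entries copied from Fig. 5.

```python
from typing import Optional

# One column per supported language, in the order English, Spanish, French, Italian.
LANGUAGES = ("en", "es", "fr", "it")

# One row per keyword; position in the tuple identifies the language of each entry.
SYNONYM_TABLE = [
    ("network", "red", "réseau", "rete"),
    ("processor", "procesador", "processeur", "processore"),
]


def lookup(term: str, lang: str) -> Optional[dict]:
    """Return the full row for `term` only if it appears in the `lang` column."""
    col = LANGUAGES.index(lang)
    for row in SYNONYM_TABLE:
        if row[col] == term:
            return dict(zip(LANGUAGES, row))
    return None


spanish_red = lookup("red", "es")  # the Spanish word for network
english_red = lookup("red", "en")  # the English color word is not a keyword here
print(spanish_red, english_red)
```

Because every lookup names the language of the term, "red" entered as a Spanish keyword resolves to the network row, while "red" entered as an English word matches nothing in the table.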

The present invention, it will be recognized, is especially adapted for use in a data
processing system such as a general purpose computer with a stored program containing
computer program means including a plurality of instructions. Those instructions will generally
be written in a high level language which is readable by a human and translated into machine
language, that is, simple instructions which are understood by the data processing system. In an
appropriate instance such instructions could be directly written in a machine language
programming language, if desired, a system which allows for efficiency of execution but which
is more difficult to program. The present invention is not limited to any particular input
language.

As used in the present document, software, computer program and computer program
means are used interchangeably. Software in the present context means any expression, in any
language, code or notation, of a set of instructions intended to cause a system having an
information processing capability to perform a particular function either directly or after either
or both of the following: a) conversion to another language, code or notation; b) reproduction in a
different material form. The Unicode system for managing different languages has
been used in the description of the preferred embodiment, but other suitable methods for
representing different languages could also be used to advantage in the present invention, if
desired.

The term national language has been used to represent a language associated with a user
of the system. This language could be any language supported by the system, and might include
different languages for different users. So, "national language" might represent Spanish for a
Mexican or a person from Spain and might represent French for a person from France or other
French-speaking locales. Appropriate synonym tables are available for a variety of common
languages as are systems for locating key words and separating common text with little
uniqueness from key words which are descriptive of the document under consideration. Such key
word locating systems are often technologically directed and identify words which are of interest
to the technology under consideration.

Of course, many modifications of the present invention will be apparent to those skilled
in the relevant art in view of the foregoing description of the preferred embodiment, taken
together with the accompanying drawings and the appended claims. For example, the present
invention has been described in connection with documents and searches in English and in a
national language, whereas the number of supported languages need not be two and there need
not be only a single national language. Further, in some circumstances, the documents could be
written in a combination of supported languages. Additionally, some elements of the present
invention can be used to advantage without the corresponding use of other elements. For example,
the use of the synonym or keyword dictionary is not the only way to accomplish the translation of
keywords into other languages. Further, various other devices could be substituted to advantage
depending on the environmental circumstances. Accordingly, the foregoing description of the
preferred embodiment should be considered as merely illustrative of the principles of the present
invention and not in limitation thereof.